Hierarchy Identification for Automatically Generating Table-of-Contents

نویسندگان

  • Nicolai Erbs
  • Iryna Gurevych
  • Torsten Zesch
چکیده

A table-of-contents (TOC) provides a quick reference to a document’s content and structure. We present the first study on identifying the hierarchical structure for automatically generating a TOC using only textual features instead of structural hints e.g. from HTML-tags. We create two new datasets to evaluate our approaches for hierarchy identification. We find that our algorithm performs on a level that is sufficient for a fully automated system. For documents without given segment titles, we extend our work by automatically generating segment titles. We make the datasets and our experimental framework publicly available in order to foster future research in TOC generation.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Visualizing websites using a hierarchical table of contents browser: WebTOC

A method is described for visualizing the contents of a Web site with a hierarchical table of contents using a Java program and applet called WebTOC. The automatically generated expand/contract table of contents provides graphical information indicating the number of elements in branches of the hierarchy as well as individual and cumulative sizes. Color can be used to represent another attribut...

متن کامل

RFID-based decision support within maintenance management of urban tunnel systems

Efficiently, tracking information related to components, materials and equipment from the production/construction phase to operation and maintenance is a challenge in the industries. The industry environment is a natural fit for generating and utilizing instance-level data for decision support. Advanced electronic identification and data storage technologies e.g. radio frequency identification ...

متن کامل

RFID-based decision support within maintenance management of urban tunnel systems

Efficiently, tracking information related to components, materials and equipment from the production/construction phase to operation and maintenance is a challenge in the industries. The industry environment is a natural fit for generating and utilizing instance-level data for decision support. Advanced electronic identification and data storage technologies e.g. radio frequency identification ...

متن کامل

A Semi-supervised Approach for Generating a Table-of-Contents

This paper presents a semi-supervised model for generating a table-of-contents as an indicative summarization. We mainly focus on using word cluster-based information derived from a large amount of unannotated data by an unsupervised algorithm. We integrate word cluster-based features into a discriminative structured learning model, and show that our approach not only increases the quality of t...

متن کامل

Identification and Prioritization of Factors Affecting E-teacher’s Performance based on Fuzzy Analytic Hierarchy Process (AHP)

Introduction: Information and communication technology has changed the traditional role of teachers and learners. It’s important to detect factors influencing e-teachers’ role to improve their performance more than ever. So in this study we identified and prioritized factors influencing e-teacher’s effective performance. Methods: In this descriptive study, 15 e-learning experts from Tehran Uni...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013